Skip to content

[cueweb/docs] Host and Allocation Management: Hosts monitor page - Initial version#2391

Merged
ramonfigueiredo merged 10 commits into
AcademySoftwareFoundation:masterfrom
ttpss930141011:dev-days/2292-hosts-monitor-page
Jun 6, 2026
Merged

[cueweb/docs] Host and Allocation Management: Hosts monitor page - Initial version#2391
ramonfigueiredo merged 10 commits into
AcademySoftwareFoundation:masterfrom
ttpss930141011:dev-days/2292-hosts-monitor-page

Conversation

@ttpss930141011

@ttpss930141011 ttpss930141011 commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

Related Issues

This is the foundation for the other Host & Allocation Management issues that need a host listing surface to build on: #2293 (lock/unlock), #2294 (reboot), #2295 (host detail page), #2296 (host tag editor).

Summarize your change.

Adds a /hosts Monitor Hosts page to CueWeb: the web equivalent of CueGUI's CueCommander Monitor Hosts plugin (cuegui/cuegui/plugins/MonitorHostsPlugin.py, cuegui/cuegui/HostMonitorTree.py). CueWeb previously had no hosts page; the sidebar CueCommander > Monitor Hosts link 404'd. This provides the basic sortable, auto-refreshing table the issue asks for.

  • New page at app/hosts/page.tsx: loads hosts via the existing getHosts() proxy (/api/host/gethosts) and auto-refreshes every 30s, with loading (skeletons), empty, and error (inline message + Retry) states. Read-only.
  • Sortable, filterable table (app/hosts/columns.tsx) with columns: Name, State, Locked, NIMBY, Cores (Idle/Total), Memory (Idle/Total), Free /mcp. State/Locked reuse the existing Status badge. Numeric columns sort by their underlying value, not the formatted string.
  • Host type widened (app/utils/get_utils.ts) to include the fields the table needs. Memory/mcp values arrive from the gateway as KB-in-string, so there are small parse/format helpers (app/hosts/host_format_utils.ts) with Jest unit tests.
  • getHosts() (app/utils/get_utils.ts) now throws on a failed request instead of collapsing failures into [], so the page can tell a real Cuebot/gateway outage from an empty registry and actually render the inline error + Retry (the dashboard host widgets get the same fix for free).
  • SimpleDataTable (shared by jobs/layers/frames) gains an isHostsTable flag, mirroring the existing isFramesTable / isFramesLogTable flags: host-specific filter placeholder and empty-state copy, and no row context menu. It defaults to false, so existing callers are unchanged, verified by the existing test suite still passing.
  • Docs: Monitor Hosts section in the CueWeb user guide (with CueCommander-menu and light/dark page screenshots), a feature entry in the CueWeb overview (other-guides/cueweb.md), and a reference subsection plus the GetHosts endpoint / /api/host/gethosts proxy-route entries (reference/cueweb.md).

Idle/Total vs used/total: the issue text says "used/total", but the backend and CueGUI's Monitor Hosts both expose idle vs total, so the columns show and are labeled "(Idle/Total)" to stay faithful to the data source.

Scope: read-only by design. Host actions (lock/unlock, tag edit, reboot, NIMBY toggle) and server-side filtering belong to the sibling issues (#2293#2296) and are intentionally out of scope here.

Testing: added Jest unit tests for the formatting/sort helpers; full CueWeb suite passes (48/48). Manually verified against the local sandbox, the page lists the rqd hosts, sorting and the substring filter work, the table refreshes every 30s, and the existing jobs/layers/frames tables are unaffected by the SimpleDataTable change.

LLM usage disclosure

Hai Shun: @ttpss930141011

  • Claude (Opus) was used to explore the existing CueGUI patterns, draft unit tests and the PR request, and write the documentation.

Co-authored-by: Hai Shun o927416847@gmail.com
Co-authored-by: Ramon Figueiredo rfigueiredo@imageworks.com

@coderabbitai

coderabbitai Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

Adds a Hosts monitor page: Host type extensions, KB-format/idling helpers with tests, hostColumns and SimpleDataTable hosts mode, a HostsPage client that polls getHosts() every 30s with error handling, and user-guide documentation.

Changes

Host Monitoring Page Feature

Layer / File(s) Summary
Host type contract extension
cueweb/app/utils/get_utils.ts
Host type expanded with nimbyEnabled, cores, idleCores, and memory/free metrics typed as strings.
Host formatting utilities and tests
cueweb/app/hosts/host_format_utils.ts, cueweb/app/__tests__/hosts/host_format_utils.test.ts
Adds kbStringToNumber, kbStringToHuman, and idleRatio with defensive handling; tests cover parsing, formatting, invalid inputs, and divide-by-zero.
SimpleDataTable hosts variant support
cueweb/components/ui/simple-data-table.tsx
Adds isHostsTable prop and hosts-specific behavior: Server icon import, hosts filter placeholder/aria labels, default prop, disabled row context menus, hosts empty-state text, and omission of frame/layer menus when enabled.
Host table column definitions
cueweb/app/hosts/columns.tsx
Exports hostColumns configuring sortable headers and custom cell renderers using Status and host formatting utilities for cores, memory, and freeMcp.
Host monitoring page component
cueweb/app/hosts/page.tsx
HostsPage client component performs initial load and 30s polling via getHosts(), handles skeleton, inline retry on initial failure, preserves cached rows on poll failure, and renders SimpleDataTable with persistent column visibility.
Feature documentation
docs/_docs/user-guides/cueweb-user-guide.md
Adds "Monitor Hosts" to Core Features and a full section describing access, columns, numeric sorting, 30s auto-refresh, error/retry behavior, and read-only scope.
sequenceDiagram
  participant User
  participant HostsPage as HostsPage Component
  participant API as getHosts()
  participant State as hosts/error State
  participant Table as SimpleDataTable with hostColumns

  User->>HostsPage: Visit /hosts (mount)
  HostsPage->>HostsPage: Show skeleton (initial)
  HostsPage->>API: Initial getHosts()
  API-->>State: Return hosts or error
  State-->>Table: Populate rows (if hosts)
  Table-->>User: Display table

  Note over HostsPage: Every 30s
  HostsPage->>API: Poll getHosts()
  alt Poll succeeds
    API-->>State: Update hosts
    State-->>Table: Refresh table
  else Poll fails & no cached hosts
    State-->>User: Show inline error + Retry button
    User->>HostsPage: Click Retry
    HostsPage->>API: Re-fetch
  else Poll fails & cached hosts exist
    Table-->>User: Keep previous rows
  end
Loading

🎯 3 (Moderate) | ⏱️ ~20 minutes

🐰 I hopped the rows at break of light,

counted cores and memory bright;
every thirty seconds new,
old rows kept when calls go blue,
hosts in order, snug and right.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Linked Issues check ✅ Passed PR implements all core requirements from #2292: sortable table with required columns, auto-refresh every 30s, pagination support for 1000+ hosts, and substring filtering.
Out of Scope Changes check ✅ Passed All changes are directly related to the Monitor Hosts page feature. No unrelated code modifications detected.
Title check ✅ Passed The title references 'Host and Allocation Management' and 'Hosts monitor page' which accurately captures the main change—adding a new Monitor Hosts page to CueWeb with sortable, auto-refreshing host table display.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (6)
cueweb/app/hosts/page.tsx (1)

32-68: ⚡ Quick win

load and poll duplicate the same fetch/error logic.

Both functions call getHosts(), then mirror the same success (setHosts/setError(null)) and failure (setError/setHosts(prev => prev ?? [])) handling. The only difference is the cancelled guard. Consider unifying into the single load callback and reusing it inside the effect to keep the two paths from drifting.

♻️ Proposed consolidation
-  const load = React.useCallback(async () => {
-    try {
-      const data = await getHosts();
-      setHosts(data);
-      setError(null);
-    } catch (err) {
-      // getHosts already toasts via handleError; keep any previously loaded
-      // rows on a failed poll and surface an inline message only when we have
-      // nothing to show.
-      setError(err instanceof Error ? err.message : String(err));
-      setHosts((prev) => prev ?? []);
-    }
-  }, []);
-
-  React.useEffect(() => {
-    let cancelled = false;
-    const poll = async () => {
-      try {
-        const data = await getHosts();
-        if (!cancelled) {
-          setHosts(data);
-          setError(null);
-        }
-      } catch (err) {
-        if (!cancelled) {
-          setError(err instanceof Error ? err.message : String(err));
-          setHosts((prev) => prev ?? []);
-        }
-      }
-    };
-    poll();
-    const interval = setInterval(poll, REFRESH_MS);
-    return () => {
-      cancelled = true;
-      clearInterval(interval);
-    };
-  }, []);
+  const load = React.useCallback(async (isCancelled?: () => boolean) => {
+    try {
+      const data = await getHosts();
+      if (isCancelled?.()) return;
+      setHosts(data);
+      setError(null);
+    } catch (err) {
+      if (isCancelled?.()) return;
+      // getHosts already toasts via handleError; keep any previously loaded
+      // rows on a failed poll and surface an inline message only when we have
+      // nothing to show.
+      setError(err instanceof Error ? err.message : String(err));
+      setHosts((prev) => prev ?? []);
+    }
+  }, []);
+
+  React.useEffect(() => {
+    let cancelled = false;
+    const isCancelled = () => cancelled;
+    load(isCancelled);
+    const interval = setInterval(() => load(isCancelled), REFRESH_MS);
+    return () => {
+      cancelled = true;
+      clearInterval(interval);
+    };
+  }, [load]);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cueweb/app/hosts/page.tsx` around lines 32 - 68, Unify duplicated fetch/error
logic by making the existing load callback the single source of truth for
fetching hosts (it already calls getHosts and handles setHosts/setError), then
reuse load inside the React.useEffect instead of the separate poll function;
update load to accept or check a mounted/cancelled guard (e.g., a local let
cancelled or a ref like isMountedRef) so effect can call load safely and prevent
state updates after unmount, remove the poll function and its duplicated
try/catch, and keep the same failure behavior (setError(err message) and
setHosts(prev => prev ?? [])) while preserving the interval using REFRESH_MS and
clearing it on cleanup.
cueweb/app/hosts/host_format_utils.ts (2)

39-42: 💤 Low value

Consider documenting the idle > total case.

When idle > total (invalid data), the function returns a ratio greater than 1.0, which would sort at the top. This behavior is reasonable for surfacing data quality issues, but a brief comment documenting the choice could help future maintainers.

📝 Optional documentation
 // idle / total guarded against divide-by-zero; used as a numeric sort key.
+// Note: Does not clamp idle > total (invalid data); ratios > 1.0 will sort at the top.
 export function idleRatio(idle: number, total: number): number {
   if (!total) return 0;
   return idle / total;
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cueweb/app/hosts/host_format_utils.ts` around lines 39 - 42, The idleRatio
function can return values >1 when idle > total, which is intentional to surface
data-quality issues; add a concise comment above the idleRatio(idle: number,
total: number): number function that states this behavior explicitly (e.g., that
a ratio >1 indicates invalid input and is surfaced intentionally rather than
being clamped), so future maintainers understand the choice and sorting
implications.

28-36: 💤 Low value

Consider guarding against negative values.

Currently, negative KB strings (e.g., "-1000") would pass through to convertMemoryToString and display as "-1000K". While negative memory indicates a data quality issue from the backend, you might prefer to return "-" to signal invalid data more clearly.

🛡️ Optional guard for negative values
 export function kbStringToHuman(kb: string): string {
   const n = Number(kb);
   // Number("") === 0 (finite), so guard the empty string explicitly to show
   // "-" (no data) rather than "0K".
-  if (kb === "" || !Number.isFinite(n)) {
+  if (kb === "" || !Number.isFinite(n) || n < 0) {
     return "-";
   }
   return convertMemoryToString(n, "host");
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cueweb/app/hosts/host_format_utils.ts` around lines 28 - 36, The function
kbStringToHuman currently converts negative KB strings (e.g., "-1000") into a
human string via convertMemoryToString; add a guard in kbStringToHuman to treat
negative numeric values as invalid and return "-" instead of passing them to
convertMemoryToString. Specifically, after parsing kb to Number (n) in
kbStringToHuman, check if n is negative (n < 0) and return "-" when so;
otherwise continue to call convertMemoryToString as before.
cueweb/app/__tests__/hosts/host_format_utils.test.ts (2)

15-24: 💤 Low value

Consider adding a test for negative values.

The current tests cover numeric, empty, and non-numeric inputs, but not negative values. Adding a test would document the expected behavior (currently displays as negative KB, e.g., "-1000K").

🧪 Optional test for negative values
   describe("kbStringToHuman", () => {
     it("formats KB strings to a human unit", () => {
       expect(kbStringToHuman("6815744")).toBe("6.5G");
       expect(kbStringToHuman("512")).toBe("512K");
     });
     it("renders a dash for non-numeric input", () => {
       expect(kbStringToHuman("abc")).toBe("-");
       expect(kbStringToHuman("")).toBe("-");
     });
+    it("displays negative values as negative KB", () => {
+      expect(kbStringToHuman("-1000")).toBe("-1000K");
+    });
   });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cueweb/app/__tests__/hosts/host_format_utils.test.ts` around lines 15 - 24,
Add a test case to cover negative numeric input for kbStringToHuman: in the
"kbStringToHuman" describe block add an it() that calls kbStringToHuman with a
negative value string (e.g., "-1024" or "-1000000") and asserts the expected
formatted output (e.g., "-1K" or "-1000K" depending on current behavior) so the
test suite documents and prevents regressions for negative values.

26-33: 💤 Low value

Consider adding a test for idle > total.

Adding a test for the case where idle > total (invalid data) would document the expected behavior (currently returns a ratio > 1.0).

🧪 Optional test for idle > total
   describe("idleRatio", () => {
     it("returns idle / total as a 0..1 ratio", () => {
       expect(idleRatio(4, 8)).toBe(0.5);
     });
     it("returns 0 when total is 0 to avoid divide-by-zero", () => {
       expect(idleRatio(0, 0)).toBe(0);
     });
+    it("returns ratio > 1.0 when idle > total (invalid data)", () => {
+      expect(idleRatio(10, 8)).toBe(1.25);
+    });
   });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cueweb/app/__tests__/hosts/host_format_utils.test.ts` around lines 26 - 33,
Add a unit test to document the behavior when idle > total by adding an "idle >
total" case in the describe("idleRatio") group: call idleRatio with idle greater
than total (e.g., idleRatio(8, 4)) and assert the returned value equals
idle/total (2.0) so the test documents the current behavior of the idleRatio
function rather than silently allowing an unexpected >1 result.
cueweb/app/hosts/columns.tsx (1)

65-86: ⚡ Quick win

Consider sorting consistency between cores and memory columns.

The cores column sorts by idle ratio (idleRatio(idleCores, cores)) so hosts with proportionally similar free capacity sort together regardless of size (as explained by the comment on line 68). However, the memory column sorts by absolute idle bytes (kbStringToNumber(idleMemory)), meaning a host with 100 GB idle and 10% free would sort higher than a host with 50 GB idle and 50% free.

This creates inconsistent sorting behavior: cores favor proportional availability while memory favors absolute availability.

🔄 Option 1: Make memory sort by ratio for consistency
   {
     id: "memory",
     header: sortableHeader("Memory (Idle/Total)"),
-    // Sort by idle bytes (numeric), not the formatted string.
-    accessorFn: (h) => kbStringToNumber(h.idleMemory),
+    // Sort by idle ratio so "most free" sorts together regardless of host size.
+    accessorFn: (h) => idleRatio(kbStringToNumber(h.idleMemory), kbStringToNumber(h.totalMemory)),
     cell: ({ row }) => (

Option 2: If absolute memory sorting is intentional (e.g., for job placement where absolute capacity matters more than proportional), add a comment explaining the rationale similar to the cores column.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cueweb/app/hosts/columns.tsx` around lines 65 - 86, The memory column
currently sorts by absolute idle bytes (accessorFn using
kbStringToNumber(h.idleMemory)) which is inconsistent with the cores column that
sorts by proportion (idleRatio). Change the memory column's accessorFn to use
the same ratio approach: call idleRatio(kbStringToNumber(h.idleMemory),
kbStringToNumber(h.totalMemory)) and update the comment to explain "Sort by idle
ratio so hosts with similar proportional free memory sort together"; keep the
cell rendering using kbStringToHuman(row.original.idleMemory) /
kbStringToHuman(row.original.totalMemory).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@cueweb/app/__tests__/hosts/host_format_utils.test.ts`:
- Around line 15-24: Add a test case to cover negative numeric input for
kbStringToHuman: in the "kbStringToHuman" describe block add an it() that calls
kbStringToHuman with a negative value string (e.g., "-1024" or "-1000000") and
asserts the expected formatted output (e.g., "-1K" or "-1000K" depending on
current behavior) so the test suite documents and prevents regressions for
negative values.
- Around line 26-33: Add a unit test to document the behavior when idle > total
by adding an "idle > total" case in the describe("idleRatio") group: call
idleRatio with idle greater than total (e.g., idleRatio(8, 4)) and assert the
returned value equals idle/total (2.0) so the test documents the current
behavior of the idleRatio function rather than silently allowing an unexpected
>1 result.

In `@cueweb/app/hosts/columns.tsx`:
- Around line 65-86: The memory column currently sorts by absolute idle bytes
(accessorFn using kbStringToNumber(h.idleMemory)) which is inconsistent with the
cores column that sorts by proportion (idleRatio). Change the memory column's
accessorFn to use the same ratio approach: call
idleRatio(kbStringToNumber(h.idleMemory), kbStringToNumber(h.totalMemory)) and
update the comment to explain "Sort by idle ratio so hosts with similar
proportional free memory sort together"; keep the cell rendering using
kbStringToHuman(row.original.idleMemory) /
kbStringToHuman(row.original.totalMemory).

In `@cueweb/app/hosts/host_format_utils.ts`:
- Around line 39-42: The idleRatio function can return values >1 when idle >
total, which is intentional to surface data-quality issues; add a concise
comment above the idleRatio(idle: number, total: number): number function that
states this behavior explicitly (e.g., that a ratio >1 indicates invalid input
and is surfaced intentionally rather than being clamped), so future maintainers
understand the choice and sorting implications.
- Around line 28-36: The function kbStringToHuman currently converts negative KB
strings (e.g., "-1000") into a human string via convertMemoryToString; add a
guard in kbStringToHuman to treat negative numeric values as invalid and return
"-" instead of passing them to convertMemoryToString. Specifically, after
parsing kb to Number (n) in kbStringToHuman, check if n is negative (n < 0) and
return "-" when so; otherwise continue to call convertMemoryToString as before.

In `@cueweb/app/hosts/page.tsx`:
- Around line 32-68: Unify duplicated fetch/error logic by making the existing
load callback the single source of truth for fetching hosts (it already calls
getHosts and handles setHosts/setError), then reuse load inside the
React.useEffect instead of the separate poll function; update load to accept or
check a mounted/cancelled guard (e.g., a local let cancelled or a ref like
isMountedRef) so effect can call load safely and prevent state updates after
unmount, remove the poll function and its duplicated try/catch, and keep the
same failure behavior (setError(err message) and setHosts(prev => prev ?? []))
while preserving the interval using REFRESH_MS and clearing it on cleanup.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 10cfef15-6bce-471c-a7b5-e6dab9d80d0c

📥 Commits

Reviewing files that changed from the base of the PR and between a1a1518 and 976dd2d.

📒 Files selected for processing (7)
  • cueweb/app/__tests__/hosts/host_format_utils.test.ts
  • cueweb/app/hosts/columns.tsx
  • cueweb/app/hosts/host_format_utils.ts
  • cueweb/app/hosts/page.tsx
  • cueweb/app/utils/get_utils.ts
  • cueweb/components/ui/simple-data-table.tsx
  • docs/_docs/user-guides/cueweb-user-guide.md

@ttpss930141011 ttpss930141011 force-pushed the dev-days/2292-hosts-monitor-page branch from 548a7f9 to 4dd3540 Compare June 3, 2026 21:42
@ttpss930141011

Copy link
Copy Markdown
Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@ramonfigueiredo

Copy link
Copy Markdown
Collaborator

@ttpss930141011 Great job!

I am reviewing your code. Soon we will merge it on the master or request more changes (if necessary).

- the getHosts collapsed failures into [], so it never rejected
- The /hosts inline error + Retry banner was dead; outages showed "No hosts registered"
- Throw when accessGetApi returns non-array to distinguish failure from empty registry
- Also fixes the same latent dead error-handling in the dashboard host widgets
- Add Monitor Hosts screenshots (CueCommander menu + page, light/dark) to the user guide
- Expand the user-guide Monitor Hosts section: Columns menu, host filter; fix Memory sort note (idle ratio, not idle bytes)
- Mark Monitor Hosts (initial version) as implemented in the CueCommander menu/route lists and cross-link the section
- Add Monitor Hosts feature entry to the CueWeb overview (other-guides)
- Add a Monitor Hosts reference subsection plus the GetHosts endpoint and /api/host/gethosts proxy route
@ramonfigueiredo ramonfigueiredo changed the title [cueweb] Host and Allocation Management: Hosts monitor page (#2292) [cueweb/docs] Host and Allocation Management: Hosts monitor page - Initial version Jun 6, 2026
@ramonfigueiredo

ramonfigueiredo commented Jun 6, 2026

Copy link
Copy Markdown
Collaborator

Screenshots

Light Dark
hosts-light hosts-dark
Filter Sort
hosts-filter hosts-sort

@ramonfigueiredo

Copy link
Copy Markdown
Collaborator

Thanks for your contributions to the OpenCue project, @ttpss930141011

Excellent job!

PR merged to the master 🎉

@ramonfigueiredo ramonfigueiredo merged commit 94bc190 into AcademySoftwareFoundation:master Jun 6, 2026
27 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants